Scalable Distributed Subgraph Enumeration
نویسندگان
چکیده
Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. As the subgraph isomorphism operation is computationally intensive, researchers have recently focused on solving this problem in distributed environments, such as MapReduce and Pregel. Among them, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal based on a left-deep join framework. However, it is still not scalable to large graphs because of the constraints in the left-deep join framework and that each decomposed component (join unit) must be a star. In this paper, we propose SEED a scalable subgraph enumeration approach in the distributed environment. Compared to TwinTwigJoin, SEED returns optimal solution in a generalized join framework without the constraints in TwinTwigJoin. We use both star and clique as the join units, and design an effective distributed graph storage mechanism to support such an extension. We develop a comprehensive cost model, that estimates the number of matches of any given pattern graph by considering powerlaw degree distribution in the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of the intermediate results. Extensive performance studies are conducted on several real graphs, one containing billions of edges. The results demonstrate that our algorithm outperforms all other state-of-theart algorithms by more than one order of magnitude.
منابع مشابه
Scalable Subgraph Enumeration in MapReduce
Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs due to the involvement of computationally intensive subgraph isomorphism operations. Thus, some recent ...
متن کاملDistributed Graph Layout for Scalable Small-world Network Analysis
The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, an...
متن کاملImproved Ise Identification under Hardware Constraint
The three Instruction Set Extension (ISE) enumeration algorithms described in this paper are Subgraph Enumeration (SE), Subgraph Removal (SR), and Lucky Subgraph Removal (LSR). SE exhaustively enumerates all convex subgraphs of a dataflow graph. SR iteratively finds the highest gain subgraph and then locks the related nodes out of the solution space for the next iteration of the search. Finally...
متن کاملScalable distributed service migration via Complex Networks Analysis
With social networking sites providing increasingly richer context, user-centric service creation is expected to follow a similar growth with User-Generated Content. The what-is-often-called User Generated Services paradigm calls for efficient yet scalable solutions for optimally placing service facilities. Typically seen as an instance of the facility location problem, service placement has be...
متن کاملEfficient Enumeration of Induced Subtrees in a K-Degenerate Graph
In this paper, we address the problem of enumerating all induced subtrees in an input k-degenerate graph, where an induced subtree is an acyclic and connected induced subgraph. A graph G = (V,E) is a k-degenerate graph if for any its induced subgraph has a vertex whose degree is less than or equal to k, and many real-world graphs have small degeneracies, or very close to small degeneracies. Alt...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 10 شماره
صفحات -
تاریخ انتشار 2016